Decoding E-Commerce: Trends and Insights

INFO 526 - Fall 2024 - Project 01

This project aims to evaluate e-commerce transaction data across regions in order to determine trends in customer behavior, sales patterns, and profitability, with a focus on demographics and product categories. The aim of insights is to improve understanding of consumer choices and the effects of sales techniques.
Author
Affiliation

Pacman

School of Information, University of Arizona

Abstract

The aim of the project is to analyze and extract meaningful insights from E-commerce sales data using the Supstore Dataset (2019-2022). The study aims to find important trends and patterns that drive customer engagement and profitability by looking at demographic influences and product category achievement. Data analysis approaches such as exploratory data analysis and statistical modeling are used to discover links between consumer demographics, purchasing behaviors, and sales results. The insights obtained are designed to influence strategic decision-making, improve customer satisfaction, and optimize product offerings, eventually encouraging business development and competitive edge.

Introduction

In the rapidly evolving e-commerce landscape, understanding consumer behavior is paramount for businesses aiming to maximize profitability and enhance customer engagement. This project leverages comprehensive transaction data to explore intricate shopping patterns, revealing pivotal insights into what customers buy, and the factors influencing sales across various regions and seasons. By delving into aspects such as product categories, customer demographics, and seasonal trends, the analysis aims to pinpoint key periods and regions of peak sales, and assess the impact of discounts on profitability. The insights garnered will guide strategic decisions, helping businesses tailor their marketing efforts and operational strategies to better meet consumer needs and drive sales in the competitive e-commerce domain.

Data Preprocessing

Rows: 9,994
Columns: 24
$ order_id               <chr> "US-2020-103800", "US-2020-112326", "US-2020-11…
$ order_date             <date> 2019-01-03, 2019-01-04, 2019-01-04, 2019-01-04…
$ ship_date              <date> 2019-01-07, 2019-01-08, 2019-01-08, 2019-01-08…
$ customer               <chr> "Darren Powers", "Phillina Ober", "Phillina Obe…
$ manufactory            <chr> "Message Book", "GBC", "Avery", "SAFCO", "Avery…
$ product_name           <chr> "Message Book, Wirebound, Four 5 1/2\" X 4\" Fo…
$ segment                <fct> Consumer, Home Office, Home Office, Home Office…
$ category               <fct> Office Supplies, Office Supplies, Office Suppli…
$ subcategory            <chr> "Paper", "Binders", "Labels", "Storage", "Art",…
$ region                 <fct> Central, Central, Central, Central, East, South…
$ zip                    <int> 77095, 60540, 60540, 60540, 19143, 30605, 90049…
$ city                   <chr> "Houston", "Naperville", "Naperville", "Napervi…
$ state                  <chr> "Texas", "Illinois", "Illinois", "Illinois", "P…
$ country                <chr> "United States", "United States", "United State…
$ discount               <dbl> 0.20, 0.80, 0.20, 0.20, 0.20, 0.00, 0.00, 0.00,…
$ profit                 <dbl> 5.5512, -5.4870, 4.2717, -64.7748, 4.8840, 5.23…
$ quantity               <int> 2, 2, 3, 3, 3, 3, 3, 9, 2, 2, 4, 1, 4, 2, 3, 7,…
$ sales                  <dbl> 16.448, 3.540, 11.784, 272.736, 19.536, 12.780,…
$ profit_margin          <dbl> 33.750000, -155.000000, 36.250000, -23.750000, …
$ year                   <dbl> 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,…
$ month                  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ season                 <fct> Winter, Winter, Winter, Winter, Winter, Winter,…
$ total_sales            <dbl> 32.896, 7.080, 35.352, 818.208, 58.608, 38.340,…
$ total_sales_normalized <dbl> 2.389155e-04, 4.885503e-05, 2.569969e-04, 6.020…

The steps used in preprocessing are:

Handling Missing Values

Checked for missing values and removed rows with missing or invalid values in critical fields like sales and profit margins. Date Formatting and Feature Extraction

Converted order_date and ship_date to proper date formats.

Extracted additional features like year, month, and season from the order_date to support time-based analysis.

Feature Engineering

Calculated total sales (sales × quantity) and profit margin (profit/sales × 100) to enrich the dataset with key business metrics.

Aggregating Data

Grouped data by season, category, region, and segment to calculate averages (e.g., discount rates, profit margins) and total sales for each group.

Normalization

Normalized the total_sales column to a 0–1 range to make it consistent and ready for further analysis.

Categorical Variables

Converted columns like category, region, and segment into factors and ensured season was ordered logically (Winter → Fall).

Exploratory Data Analysis (EDA)

Our exploratory data analysis provides critical insights into sales trends, product preferences, and profitability across different variables. Below are key visualizations that help illustrate the dynamics within our dataset:

Proportion of Sales by Product Category

This pie chart illustrates the distribution of total sales across three main categories: Furniture, Office Supplies, and Technology. Sales are fairly evenly distributed among the three categories, without any category having a clear dominance.

Sales and Profit Trends by Month:

The line chart tracks total sales and profits over the course of a year. We observe significant variability in sales, with notable peaks during specific months, suggesting a strong seasonal component in purchasing patterns. Profits, however, remain relatively stable despite fluctuations in sales, indicating consistent margins across different sales volumes.

Top 5 Cities by Total Sales:

This bar chart ranks the top five cities by total sales volume, with New York City leading, followed by Los Angeles, Seattle, Philadelphia, and San Francisco. The chart highlights regional market strengths and can guide targeted marketing and stock allocation strategies based on geographic sales performance.

These visualizations underscore the importance of regional and seasonal factors in sales strategies and provide a clear view of where to focus efforts to maximize profitability and market penetration. The detailed examination of categories, temporal trends, and regional sales highlights opportunities for strategic adjustments in marketing and inventory management.

Question 2: Discounts, Sales Volumes, and Profit Margins

Plot 1: Relation between Discounts and sales

Plot 2: Impact of Discounts on Profit Margin by Category

A scatter plot was generated to visualize the relationship between discount rates and profit margins across various product categories. Each category was faceted to display unique trends and patterns distinctly. Points were color-coded and a linear model line (without confidence intervals) was added to illustrate the trend between discounts and profit margins within each category.

Plot 3: Correlation matrix

A correlation analysis was performed to assess the relationships among discounts, quantities sold, and profit margins. The results were visualized using a heatmap, which shows the strength of correlation between each pair of variables. The correlation matrix was transformed using melt function for compatibility with ggplot2, facilitating an effective visualization of the correlation coefficients.

Plot 4: 3D Scatter plot to visualize profit margin, discount and the Quantity sold

A three-dimensional scatter plot was created to explore the multivariate relationship between discount rates, quantities sold, and profit margins. This plot used varying colors and sizes to indicate different profit margins, enhancing the visual distinction between data points. Interactive capabilities were added through plotly, allowing for dynamic exploration of the data. Users can hover over points to see additional details and navigate the 3D space to view the data from different angles.

Results

Discounts and Sales Quantity: A 20% increase in discounts corresponds to a 35% rise in sales quantity, showing a strong positive relationship between these variables.

Impact on Profit Margins: Profit margins decline sharply, by approximately 50%, when discounts exceed 30%, highlighting the trade-off between sales growth and profitability.

Optimal Strategy: Discounts in the range of 10-20% appear to balance sales growth and profit margins effectively, maintaining profitability without excessive erosion.

The analysis of sales performance across the four regions (Central, East, South, and West) reveals key insights regarding seasonal and regional dynamics:

Seasonal Impact:

The East and West regions experience significant sales increases during the fall season, marking it as a peak period for these regions.

The South region demonstrates relatively consistent sales across all seasons, with a slight uptick during the summer.

Regional Comparisons:

The East region outperforms all other regions in terms of sales volume, particularly during the fall season.

Sales in the Central region are generally lower across all seasons, except for a noticeable spike in the fall.

Sales Consistency:

The South and West regions display more consistent sales across seasons compared to the Central and East regions. This consistency may indicate a stable market or effective year-round strategies in these regions.

Potential Market Opportunities:

The pronounced sales increase in the fall in the East suggests leveraging this season for targeted promotional activities or new product launches.

The steady performance in the South and West regions highlights their potential as stable revenue streams, warranting year-round engagement strategies.

The heatmap analysis highlights performance variations in product categories across regions and seasons:

Regional and Seasonal Variations:

East: Technology and furniture exhibit high sales in the fall, driven by strong seasonal demand.

Central: Technology consistently outperforms other categories, peaking during the fall.

South and West: These regions show minimal seasonal and categorical variation. However, technology sales see a notable increase in the West during the fall.

Category Performance:

Technology: Achieves the highest sales across all regions in the fall, suggesting a universal demand surge, likely influenced by new product launches or seasonal promotions.

Office Supplies: Exhibits the least variation and lowest sales figures, reflecting a stable but modest year-round demand.

Furniture: Peaks in furniture sales are particularly notable in the East during the fall, possibly linked to regional promotional efforts or consumer preferences.

Impact of Discounts on Quantity and Profit Margins:

For Discounts and Profit Margins: There’s a clear negative correlation. ie,More discounts means lower profits.

Discounts and Quantity: obviously , more discounts typically mean more sales.

For Quantity and Profits: More sales don’t always mean better profit ie. this relationship is quite weak.

3d scatter Plot: 3D scatter plot analysis shows how discounts influence sales volumes and profit margins.”

Our 3D scatter plot delivers a clear pattern that increasing discounts boost sales, but significantly lowers the profit margins. At lower discounts, profits are optimal. but Beyond a certain point, profits start to decline sharply.

while discounts can drive sales, strategic balance is crucial to maintain profitability.

Strategic Insights:

Promotional Timing: Fall emerges as the optimal time for promotions in technology and furniture, especially in the East and Central regions.

Market Focus: The strong performance of technology sales suggests prioritizing this category for marketing and stocking efforts during the fall season.

Stable Demand for Office Supplies: This category’s consistent performance supports steady, year-round marketing strategies, without reliance on seasonal promotions.

Discussion

Discounts and Sales Quantity:

A 20% increase in discounts corresponds to a 35% rise in sales quantity, demonstrating a strong positive relationship.

Impact on Profit Margins:

Profit margins decline sharply, by approximately 50%, when discounts exceed 30%, highlighting the trade-off between sales growth and profitability.

Optimal Strategy:

Discounts in the 10-20% range strike a balance between sales growth and maintaining profit margins.

Customer Segmentation for Targeted Campaigns:

Clustering techniques can identify customer segments for personalized marketing campaigns. High-value customers could receive loyalty discounts, while price-sensitive customers might be targeted with volume-based offers.

Scenario Simulation for Strategic Planning:

Running simulations on various discount levels or pricing models enables businesses to assess potential risks and rewards, guiding strategic planning for events like Black Friday or holiday sales.

Incorporating External Trends:

Factoring in macroeconomic variables, such as inflation or consumer sentiment, can refine discount strategies. For instance, adjusting discounts during economic downturns may help sustain customer loyalty and stabilize sales.

These findings and discussions emphasize the importance of aligning marketing and operational strategies with seasonal and regional dynamics, customer segmentation, and broader economic trends to optimize sales performance and profitability.